Success in the quest for artificial
intelligence has the potential to bring
unprecedented benefits to humanity,
and it is therefore worthwhile to investigate
how to maximize these benefits
while avoiding potential pitfalls. This
article gives numerous examples (which
should by no means be construed as an
exhaustive list) of such worthwhile
research aimed at ensuring that AI
remains robust and beneficial.


Artificial intelligence (AI) research has explored a variety
of problems and approaches since its inception, but for
the last 20 years or so has been focused on the problems
surrounding the construction of intelligent agents:
systems that perceive and act in some environment. In this
context, the criterion for intelligence is related to statistical
and economic notions of rationality (colloquially, the ability
to make good decisions, plans, or inferences). The adoption
of probabilistic representations and statistical learning
methods has led to a large degree of integration and
cross-fertilization between AI, machine learning, statistics, control
theory, neuroscience, and other fields. The establishment of
shared theoretical frameworks, combined with the availability
of data and processing power, has yielded remarkable
successes in various component tasks such as speech recognition,
image classification, autonomous vehicles, machine
translation, legged locomotion, and question-answering
systems.
As capabilities in these areas and others cross the
threshold from laboratory research to economically
valuable technologies, a virtuous cycle takes hold
whereby even small improvements in performance
have significant economic value, prompting greater
investments in research. There is now a broad consensus
that AI research is progressing steadily, and
that its impact on society is likely to increase. The
potential benefits are huge, since everything that civilization
has to offer is a product of human intelligence;
we cannot predict what we might achieve
when this intelligence is magnified by the tools AI
may provide, but the eradication of disease and
poverty is not unfathomable. Because of the great
potential of AI, it is valuable to investigate how to
reap its benefits while avoiding potential pitfalls.

Progress in AI research makes it timely to focus 
research not only on making AI more capable, but 
also on maximizing the societal benefit of AI. Such 
considerations motivated the AAAI 2008-09 Presidential
Panel on Long-Term AI Futures (Horvitz and
Selman 2009) and other projects and community
efforts on AI's future impacts. These constitute a significant
expansion of the field of AI itself, which up
to now has focused largely on techniques that are
neutral with respect to purpose. The present document
can be viewed as a natural continuation of
these efforts, focusing on identifying research directions
that can help maximize the societal benefit of
AI. This research is by necessity interdisciplinary, 
because it involves both society and AI. It ranges 
from economics, law, and philosophy to computer 
security, formal methods, and, of course, various 
branches of AI itself. The focus is on delivering AI 
that is beneficial to society and robust in the sense 
that the benefits are guaranteed: our AI systems must 
do what we want them to do. 

This article was drafted with input from the attendees
of the 2015 conference The Future of AI: Opportunities
and Challenges (see Acknowledgements),
and was the basis for an open letter that has collected
nearly 7000 signatures in support of the research
priorities outlined here. 

Short-Term Research Priorities 

Short-term research priorities include optimizing
AI's economic impact, research in law and ethics, and
computer science research for robust AI. Each of these
priorities is discussed in turn in this section.

Optimizing AI's Economic Impact 

The successes of industrial applications of AI, from 
manufacturing to information services, demonstrate 
a growing impact on the economy, although there is 
disagreement about the exact nature of this impact 
and on how to distinguish between the effects of AI 
and those of other information technologies. Many
economists and computer scientists agree that there
is valuable research to be done on how to maximize 
the economic benefits of AI while mitigating adverse 
effects, which could include increased inequality and 
unemployment (Mokyr 2014; Brynjolfsson and 
McAfee 2014; Frey and Osborne 2013; Glaeser 2014; 
Shanahan 2015; Nilsson 1984; Manyika et al. 2013). 
Such considerations motivate a range of research 
directions, spanning areas from economics to psychology.
Examples include the following.

Labor Market Forecasting

When and in what order should we expect various 
jobs to become automated (Frey and Osborne 2013)? 
How will this affect the wages of less skilled workers, 
the creative professions, and various kinds of information
workers? Some have argued that AI is
likely to greatly increase the overall wealth of
humanity (Brynjolfsson and McAfee
2014). However, increased automation may push
income distribution further towards a power law 
(Brynjolfsson, McAfee, and Spence 2014), and the 
resulting disparity may fall disproportionately along 
lines of race, class, and gender; research anticipating 
the economic and societal impact of such disparity 
could be useful. 

Other Market Disruptions 

Significant parts of the economy, including finance, 
insurance, actuarial, and many consumer markets, 
could be susceptible to disruption through the use of 
AI techniques to learn, model, and predict human 
and market behaviors. These markets might be identified
by a combination of high complexity and high
rewards for navigating that complexity (Manyika et 
al. 2013). 

Policy for Managing Adverse Effects 
What policies could help increasingly automated 
societies flourish? For example, Brynjolfsson and 
McAfee (2014) explore various policies for incentivizing
development of labor-intensive sectors and
for using AI-generated wealth to support underemployed
populations. What are the pros and cons of
interventions such as educational reform, apprenticeship
programs, labor-demanding infrastructure
projects, and changes to minimum wage law, tax 
structure, and the social safety net (Glaeser 2014)? 
History provides many examples of subpopulations 
not needing to work for economic security, ranging 
from aristocrats in antiquity to many present-day citizens
of Qatar. What societal structures and other factors
determine whether such populations flourish?

Unemployment is not the same as leisure, and there 
are deep links between unemployment and unhappiness,
self-doubt, and isolation (Hetschko, Knabe, and
Schöb 2014; Clark and Oswald 1994); understanding
what policies and norms can break these links could 
significantly improve the median quality of life. 
Empirical and theoretical research on topics such as 
the basic income proposal could clarify our options 
(Van Parijs 1992; Widerquist et al. 2013). 
Economic Measures

It is possible that economic measures such as real 
GDP per capita do not accurately capture the benefits 
and detriments of heavily AI-and-automation-based
economies, making these metrics unsuitable for policy
purposes (Mokyr 2014). Research on improved
metrics could be useful for decision making. 

Law and Ethics Research 

The development of systems that embody significant 
amounts of intelligence and autonomy leads to 
important legal and ethical questions whose answers 
affect both producers and consumers of AI technology.
These questions span law, public policy, professional
ethics, and philosophical ethics, and will
require expertise from computer scientists, legal
experts, political scientists, and ethicists. For example:

Liability and Law for Autonomous Vehicles 
If self-driving cars cut the roughly 40,000 annual U.S. 
traffic fatalities in half, the car makers might get not 
20,000 thank-you notes, but 20,000 lawsuits. In what 
legal framework can the safety benefits of 
autonomous vehicles such as drone aircraft and self-driving
cars best be realized (Vladeck 2014)? Should
legal questions about AI be handled by existing (software-
and Internet-focused) cyberlaw, or should they
be treated separately (Calo 2014b)? In both military
and commercial applications, governments will need
to decide how best to bring the relevant expertise to
bear; for example, a panel or committee of professionals
and academics could be created, and Calo has
proposed the creation of a Federal Robotics Commission
(Calo 2014a).

Machine Ethics 

How should an autonomous vehicle trade off, say, a 
small probability of injury to a human against the 
near certainty of a large material cost? How should 
lawyers, ethicists, and policymakers engage the public
on these issues? Should such trade-offs be the subject
of national standards?

Autonomous Weapons 

Can lethal autonomous weapons be made to comply 
with humanitarian law (Churchill and Ulfstein 
2000)? If, as some organizations have suggested, 
autonomous weapons should be banned (Docherty 
2012), is it possible to develop a precise definition of 
autonomy for this purpose, and can such a ban practically
be enforced? If it is permissible or legal to use
lethal autonomous weapons, how should these
weapons be integrated into the existing command-and-control
structure so that responsibility and liability
remain associated with specific human actors?
What technical realities and forecasts should inform 
these questions, and how should meaningful human 
control over weapons be defined (Roff 2013, 2014; 
Anderson, Reisner, and Waxman 2014)? Are 
autonomous weapons likely to reduce political aversion
to conflict, or perhaps result in accidental battles
or wars (Asaro 2008)? Would such weapons become
the tool of choice for oppressors or terrorists? Finally,
how can transparency and public discourse best
be encouraged on these issues?

Privacy 

How should the ability of AI systems to interpret the 
data obtained from surveillance cameras, phone 
lines, emails, and so on, interact with the right to privacy?
How will privacy risks interact with cybersecurity
and cyberwarfare (Singer and Friedman 2014)?
Our ability to take full advantage of the synergy 
between AI and big data will depend in part on our 
ability to manage and preserve privacy (Manyika et 
al. 2011; Agrawal and Srikant 2000). 

Professional Ethics 

What role should computer scientists play in the law 
and ethics of AI development and use? Past and current
projects to explore these questions include the
AAAI 2008-09 Presidential Panel on Long-Term AI
Futures (Horvitz and Selman 2009), the EPSRC Principles
of Robotics (Boden et al. 2011), and recently
announced programs such as Stanford's One-Hundred
Year Study of AI and the AAAI Committee on AI
Impact and Ethical Issues. 

Policy Questions 

From a public policy perspective, AI (like any powerful
new technology) enables both great new benefits
and novel pitfalls to be avoided, and appropriate
policies can ensure that we can enjoy the benefits
while risks are minimized. This raises policy questions
such as (1) What is the space of policies worth
studying, and how might they be enacted? (2)
Which criteria should be used to determine the merits
of a policy? Candidates include verifiability of
compliance, enforceability, ability to reduce risk,
ability to avoid stifling desirable technology development,
likelihood of being adopted, and ability to
adapt over time to changing circumstances.

Computer Science Research for Robust AI 

As autonomous systems become more prevalent in 
society, it becomes increasingly important that they 
robustly behave as intended. The development of 
autonomous vehicles, autonomous trading systems, 
autonomous weapons, and so on, has therefore 
stoked interest in high-assurance systems where 
strong robustness guarantees can be made; Weld and 
Etzioni (1994) have argued that "society will reject 
autonomous agents unless we have some credible 
means of making them safe." Different ways in 
which an AI system may fail to perform as desired 
correspond to different areas of robustness research: 

Verification: How to prove that a system satisfies 
certain desired formal properties. (Did I build the system
right?)

Validity: How to ensure that a system that meets 
its formal requirements does not have unwanted 
behaviors and consequences. (Did I build the right 
system?) 
Security: How to prevent intentional manipulation
by unauthorized parties. 

Control: How to enable meaningful human control 
over an AI system after it begins to operate. (OK, I 
built the system wrong; can I fix it?) 

Verification 

By verification, we mean methods that yield high 
confidence that a system will satisfy a set of formal 
constraints. When possible, it is desirable for systems 
in safety-critical situations, for example, self-driving 
cars, to be verifiable. 

Formal verification of software has advanced significantly
in recent years: examples include the seL4
kernel (Klein et al. 2009), a complete, general-purpose
operating system kernel that has been mathematically
checked against a formal specification to
give a strong guarantee against crashes and unsafe
operations, and HACMS, DARPA's "clean-slate, formal
methods-based approach" to a set of high-assurance
software tools (Fisher 2012). Not only should it
be possible to build AI systems on top of verified substrates;
it should also be possible to verify the designs
of the AI systems themselves, particularly if they follow
a componentized architecture, in which guarantees
about individual components can be combined
according to their connections to yield properties of
the overall system. This mirrors the agent architectures
used in Russell and Norvig (2010), which separate
an agent into distinct modules (predictive models,
state estimates, utility functions, policies,
learning elements, and others), and has analogues in
some formal results on control system designs.
Research on richer kinds of agents (for example,
agents with layered architectures, anytime components,
overlapping deliberative and reactive elements,
metalevel control, and so on) could contribute
to the creation of verifiable agents, but we
lack the formal algebra to properly define, explore,
and rank the space of designs.
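
As a rough illustration of how guarantees about individual
components might be combined along their connections, the
following Python sketch models each module as a contract (an
assumption on its input and a guarantee on its output) and checks
contracts exhaustively over a small finite domain. The names
Component, holds, connects, and chain, and the two toy modules, are
hypothetical and introduced only for this example; real verification
would rely on proof rather than enumeration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Component:
    """Hypothetical component with a contract: whenever the input
    satisfies `assume`, the output of `run` satisfies `guarantee`."""
    name: str
    assume: Callable[[int], bool]
    run: Callable[[int], int]
    guarantee: Callable[[int], bool]


def holds(c: Component, domain: Iterable[int]) -> bool:
    """Exhaustively check one component's contract on a finite domain."""
    return all(c.guarantee(c.run(x)) for x in domain if c.assume(x))


def connects(up: Component, down: Component, domain: Iterable[int]) -> bool:
    """Check that every output of `up` is an input that `down` accepts."""
    return all(down.assume(up.run(x)) for x in domain if up.assume(x))


def chain(up: Component, down: Component) -> Component:
    """Wire `up` into `down`; the chain inherits up's assumption
    and down's guarantee."""
    return Component(
        name=f"{up.name} -> {down.name}",
        assume=up.assume,
        run=lambda x: down.run(up.run(x)),
        guarantee=down.guarantee,
    )


# Toy modules (illustrative only): a state estimator that clamps a raw
# range reading, and a policy that picks a speed from the estimate.
estimator = Component(
    name="estimator",
    assume=lambda raw: True,
    run=lambda raw: max(0, min(100, raw)),
    guarantee=lambda dist: 0 <= dist <= 100,
)
policy = Component(
    name="policy",
    assume=lambda dist: 0 <= dist <= 100,
    run=lambda dist: min(dist // 2, 30),
    guarantee=lambda speed: 0 <= speed <= 30,
)

DOMAIN = range(-50, 200)
parts_ok = holds(estimator, DOMAIN) and holds(policy, DOMAIN)
wiring_ok = connects(estimator, policy, DOMAIN)
system = chain(estimator, policy)
```

If parts_ok and wiring_ok both hold, the chained system meets the
policy's guarantee for every raw reading in DOMAIN, without the
chain itself being re-examined. That conclusion depends on the toy
domain covering every value the estimator can emit.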

Perhaps the most salient difference between verification
of traditional software and verification of AI systems
is that the correctness of traditional software is
defined with respect to a fixed and known machine
model, whereas AI systems (especially robots and
other embodied systems) operate in environments
that are at best partially known by the system designer.
In these cases, it may be practical to verify that the
system acts correctly given the knowledge that it has,
avoiding the problem of modelling the real environment
(Dennis et al. 2013). A lack of design-time knowledge
also motivates the use of learning algorithms
within the agent software, and verification becomes
more difficult: statistical learning theory gives so-called
ε-δ (probably approximately correct) bounds, mostly
for the somewhat unrealistic settings of supervised
learning from i.i.d. data and single-agent reinforcement
learning with simple architectures and full
observability, but even then requiring prohibitively
large sample sizes to obtain meaningful guarantees.
Work in adaptive control theory (Åström and Wittenmark
2013), the theory of so-called cyberphysical
systems (Platzer 2010), and verification of hybrid or
robotic systems (Alur 2011; Winfield, Blum, and Liu
2014) is highly relevant but also faces the same difficulties.
And of course all these issues are laid on top
of the standard problem of proving that a given software
artifact does in fact correctly implement, say, a
reinforcement learning algorithm of the intended
type. Some work has been done on verifying neural
network applications (Pulina and Tacchella 2010;
Taylor 2006; Schumann and Liu 2010) and the
notion of partial programs (Andre and Russell 2002;
Spears 2006) allows the designer to impose arbitrary
structural constraints on behavior, but much remains
to be done before it will be possible to have high confidence
that a learning agent will learn to satisfy its
design criteria in realistic contexts.
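
To give a sense of the sample sizes that probably approximately
correct bounds can demand, the sketch below evaluates one textbook
bound, which is not taken from this article: for a finite hypothesis
class H that contains the target, and a learner that returns any
hypothesis consistent with m i.i.d. examples, m >= (ln|H| +
ln(1/delta)) / epsilon suffices for error at most epsilon with
probability at least 1 - delta. The function name and the
million-parameter scenario are assumptions made for illustration.

```python
import math


def pac_sample_bound(ln_num_hypotheses: float, epsilon: float, delta: float) -> int:
    """Number of i.i.d. examples sufficient for a consistent learner over
    a finite hypothesis class (realizable case) to reach true error at
    most `epsilon` with probability at least 1 - `delta`:

        m >= (ln|H| + ln(1/delta)) / epsilon

    ln|H| is passed directly so that very large classes stay representable.
    """
    if ln_num_hypotheses < 0 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need ln|H| >= 0 and epsilon, delta in (0, 1)")
    return math.ceil((ln_num_hypotheses + math.log(1.0 / delta)) / epsilon)


# Hypothetical case: one million binary parameters, so |H| = 2**1_000_000.
ln_h = 1_000_000 * math.log(2)
for eps in (0.1, 0.01, 0.001):
    print(eps, pac_sample_bound(ln_h, eps, delta=1e-6))
```

Under these assumptions the requirement scales as 1/epsilon and
reaches hundreds of millions of examples at epsilon = 0.001, even in
this idealized supervised setting.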

Validity 

A verification theorem for an agent design has the
form, "If environment satisfies assumptions φ then
behavior satisfies requirements ψ." There are two
ways in which a verified agent can, nonetheless, fail
to be a beneficial agent in actuality: first, the environmental
assumption φ is false in the real world,
leading to behavior that violates the requirements ψ;
second, the system may satisfy the formal requirement
ψ but still behave in ways that we find highly
undesirable in practice. It may be the case that this
undesirability is a consequence of satisfying ψ when
φ is violated; that is, had φ held the undesirability
would not have been manifested; or it may be the
case that the requirement ψ is erroneous in itself. Russell
and Norvig (2010) provide a simple example: if a
robot vacuum cleaner is asked to clean up as much
dirt as possible, and has an action to dump the contents
of its dirt container, it will repeatedly dump and
clean up the same dirt. The requirement should focus
not on dirt cleaned up but on cleanliness of the floor.
Such specification errors are ubiquitous in software
verification, where it is commonly observed that
writing correct specifications can be harder than writing
correct code. Unfortunately, it is not possible to
verify the specification: the notions of beneficial and
desirable are not separately made formal, so one cannot
straightforwardly prove that satisfying ψ necessarily
leads to desirable behavior and a beneficial
agent.
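
The vacuum cleaner example above can be made concrete with a
minimal simulation. The environment, action names, and the two
policies below are assumptions made for illustration and are not
taken from Russell and Norvig (2010). The sketch scores each policy
under both candidate requirements: total dirt collected, and dirt
left on the floor.

```python
def simulate(policy, initial_dirt=3, steps=20):
    """Run a toy vacuum world and report both candidate measures.

    State: units of dirt on the floor and in the robot's container.
    Actions: "suck" moves one unit from floor to container,
    "dump" empties the container back onto the floor, "idle" does nothing.
    """
    floor, container, collected = initial_dirt, 0, 0
    for _ in range(steps):
        action = policy(floor, container)
        if action == "suck" and floor > 0:
            floor -= 1
            container += 1
            collected += 1
        elif action == "dump":
            floor += container
            container = 0
    return {"dirt_collected": collected, "floor_dirt_left": floor}


def tidy_policy(floor, container):
    """Clean until the floor is clean, then stop."""
    return "suck" if floor > 0 else "idle"


def dump_and_reclean_policy(floor, container):
    """Maximize dirt collected by recycling the same dirt."""
    if floor > 0:
        return "suck"
    return "dump" if container > 0 else "idle"


tidy = simulate(tidy_policy)
gamed = simulate(dump_and_reclean_policy)
print("tidy: ", tidy)
print("gamed:", gamed)
```

In this toy setting the dump-and-reclean policy scores far higher on
dirt collected while leaving the floor dirtier, which is the gap
between the stated requirement and the intended one.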

In order to build systems that robustly behave well,
we of course need to decide what good behavior
means in each application domain. This ethical question
is tied intimately to questions of what engineering
techniques are available, how reliable these techniques
are, and what trade-offs can be made, all
areas where computer science, machine learning, and
broader AI expertise is valuable. For example, Wallach
and Allen (2008) argue that a significant consideration
is the computational expense of different
behavioral standards (or ethical theories): if a standard
cannot be applied efficiently enough to guide
behavior in safety-critical situations, then cheaper
approximations may be needed. Designing simplified
rules (for example, to govern a self-driving car's
decisions in critical situations) will likely require
expertise from both ethicists and computer scientists.
Computational models of ethical reasoning may
shed light on questions of computational expense
and the viability of reliable ethical reasoning methods
(Asaro 2006, Sullins 2011).

108 AI MAGAZINE

Security 

Security research can help make AI more robust. As AI 
systems are used in an increasing number of critical 
roles, they will take up an increasing proportion of 
cyberattack surface area. It is also probable that AI 
and machine-learning techniques will themselves be 
used in cyberattacks. 

Robustness against exploitation at the low level is 
closely tied to verifiability and freedom from bugs. 
For example, the DARPA SAFE program aims to build 
an integrated hardware-software system with a flexible
metadata rule engine, on which can be built
memory safety, fault isolation, and other protocols
that could improve security by preventing
exploitable flaws (DeHon et al. 2011). Such programs
cannot eliminate all security flaws (since verification
is only as strong as the assumptions that underlie the
specification), but could significantly reduce vulnerabilities
of the type exploited by the recent Heartbleed
and Bash bugs. Such systems could be preferentially
deployed in safety-critical applications,
where the cost of improved security is justified. 

At a higher level, research into specific AI and 
machine-learning techniques may become increasingly
useful in security. These techniques could be
applied to detecting intrusions (Lane 2000),
analyzing malware (Rieck et al. 2011), or detecting
potential exploits in other programs through code
analysis (Brun and Ernst 2004). It is not implausible
that cyberattack between states and private actors
will be a risk factor for harm from near-future AI systems,
motivating research on preventing harmful
events. As AI systems grow more complex and are
networked together, they will have to intelligently
manage their trust, motivating research on statistical-behavioral
trust establishment (Probst and Kasera
2007) and computational reputation models (Sabater
and Sierra 2005).

Control 

For certain types of safety-critical AI systems, especially
vehicles and weapons platforms, it may be
desirable to retain some form of meaningful human 
control, whether this means a human in the loop, on 
the loop (Hexmoor, McLaughlan, and Tuli 2009; 
Parasuraman, Sheridan, and Wickens 2000), or some 
other protocol. In any of these cases, there will be 
technical work needed in order to ensure that meaningful
human control is maintained (UNIDIR 2014).

Automated vehicles are a test-bed for effective control-granting
techniques. The design of systems and
protocols for transition between automated navigation
and human control is a promising area for further
research. Such issues also motivate broader
research on how to optimally allocate tasks within
human-computer teams, both for identifying situations
where control should be transferred, and for
applying human judgment efficiently to the highest-value
decisions.

Long-Term Research Priorities 

A frequently discussed long-term goal of some AI 
researchers is to develop systems that can learn from 
experience with humanlike breadth and surpass 
human performance in most cognitive tasks, thereby 
having a major impact on society. If there is a nonnegligible
probability that these efforts will succeed
in the foreseeable future, then additional current
research beyond that mentioned in the previous sections
will be motivated, as exemplified next, to help
ensure that the resulting AI will be robust and beneficial.

Assessments of this success probability vary widely
between researchers, but few would argue with
great confidence that the probability is negligible,
given the track record of such predictions. For example,
Ernest Rutherford, arguably the greatest nuclear
physicist of his time, said in 1933 (less than 24
hours before Szilard's invention of the nuclear chain
reaction) that nuclear energy was "moonshine"
(Press 1933), and Astronomer Royal Richard Woolley
called interplanetary travel "utter bilge" in 1956
(Reuters 1956). Moreover, to justify a modest investment
in this AI robustness research, this probability
need not be high, merely nonnegligible, just as a 
modest investment in home insurance is justified by 
a nonnegligible probability of the home burning 
down. 

Verification 

Reprising the themes of short-term research, research 
enabling verifiable low-level software and hardware 
can eliminate large classes of bugs and problems in 
general AI systems; if such systems become increasingly
powerful and safety-critical, verifiable safety
properties will become increasingly valuable. If the
theory of extending verifiable properties from components
to entire systems is well understood, then
even very large systems can enjoy certain kinds of
safety guarantees, potentially aided by techniques
designed explicitly to handle learning agents and
high-level properties. Theoretical research, especially
if it is done explicitly with very general and capable
AI systems in mind, could be particularly useful.

A related verification research topic that is distinctive
to long-term concerns is the verifiability of
systems that modify, extend, or improve themselves,
possibly many times in succession (Good 1965,
Vinge 1993). Attempting to straightforwardly apply
formal verification tools to this more general setting
presents new difficulties, including the challenge
that a formal system that is sufficiently powerful cannot
use formal methods in the obvious way to gain
assurance about the accuracy of functionally similar
formal systems, on pain of inconsistency through
Gödel's incompleteness (Fallenstein and Soares 2014;
Weaver 2013). It is not yet clear whether or how this
problem can be overcome, or whether similar problems
will arise with other verification methods of
similar strength.

WINTER 2015 109

Finally, it is often difficult to actually apply formal
verification techniques to physical systems,
especially systems that have not been designed with
verification in mind. This motivates research pursuing
a general theory that links functional specification
to physical states of affairs. This type of theory
would allow use of formal tools to anticipate and
control behaviors of systems that approximate
rational agents, alternate designs such as satisficing
agents, and systems that cannot be easily described
in the standard agent formalism (powerful prediction
systems, theorem provers, limited-purpose science
or engineering systems, and so on). It may also
be that such a theory could allow rigorous demonstrations
that systems are constrained from taking
certain kinds of actions or performing certain kinds 
of reasoning. 

Validity 

As in the short-term research priorities, validity is 
concerned with undesirable behaviors that can arise 
despite a system's formal correctness. In the long 
term, AI systems might become more powerful and 
autonomous, in which case failures of validity could 
carry correspondingly higher costs. 

Strong guarantees for machine-learning methods, 
an area we highlighted for short-term validity 
research, will also be important for long-term safety. 
To maximize the long-term value of this work, 
machine-learning research might focus on the types 
of unexpected generalization that would be most 
problematic for very general and capable AI systems. 
In particular, it might aim to understand theoretically
and practically how learned representations of
high-level human concepts could be expected to generalize
(or fail to) in radically new contexts (Tegmark
2015). Additionally, if some concepts could be
learned reliably, it might be possible to use them to
define tasks and constraints that minimize the
chances of unintended consequences even when
autonomous AI systems become very general and
capable. Little work has been done on this topic,
which suggests that both theoretical and experimental
research may be useful.

Mathematical tools such as formal logic, probability,
and decision theory have yielded significant
insight into the foundations of reasoning and decision
making. However, there are still many open
problems in the foundations of reasoning and decision.
Solutions to these problems may make the
behavior of very capable systems much more reliable
and predictable. Example research topics in this area
include reasoning and decision under bounded computational
resources à la Horvitz and Russell (Horvitz
1987; Russell and Subramanian 1995), how to take
into account correlations between AI systems' behaviors
and those of their environments or of other
agents (Tennenholtz 2004; LaVictoire et al. 2014;
Hintze 2014; Halpern and Pass 2013; Soares and Fallenstein
2014c), how agents that are embedded in
their environments should reason (Soares 2014a;
Orseau and Ring 2012), and how to reason about
uncertainty over logical consequences of beliefs or
other deterministic computations (Soares and Fallenstein
2014b). These topics may benefit from being
considered together, since they appear deeply linked 
(Halpern and Pass 2011; Halpern, Pass, and Seeman 
2014). 

In the long term, it is plausible that we will want 
to make agents that act autonomously and powerfully
across many domains. Explicitly specifying our
preferences in broad domains in the style of near-future
machine ethics may not be practical, making
aligning the values of powerful AI systems with our 
own values and preferences difficult (Soares 2014b, 
Soares and Fallenstein 2014a). 

Consider, for instance, the difficulty of creating a 
utility function that encompasses an entire body of 
law; even a literal rendition of the law is far beyond 
our current capabilities, and would be highly unsatisfactory
in practice (since law is written assuming that
it will be interpreted and applied in a flexible, case-by-case
way by humans who, presumably, already
embody the background value systems that artificial 
agents may lack). Reinforcement learning raises its 
own problems: when systems become very capable 
and general, then an effect similar to Goodhart's Law 
is likely to occur, in which sophisticated agents 
attempt to manipulate or directly control their reward 
signals (Bostrom 2014). This motivates research areas 
that could improve our ability to engineer systems 
that can learn or acquire values at run time. For example,
inverse reinforcement learning may offer a viable
approach, in which a system infers the preferences of
another rational or nearly rational actor by observing
its behavior (Russell 1998, Ng and Russell 2000). Other
approaches could use different assumptions about
underlying cognitive models of the actor whose preferences
are being learned (Chu and Ghahramani
2005), or could be explicitly inspired by the way 
humans acquire ethical values. As systems become 
more capable, more epistemically difficult methods 
could become viable, suggesting that research on such 
methods could be useful; for example, Bostrom (2014) 
reviews preliminary work on a variety of methods for 
specifying goals indirectly. 
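
The idea of inferring preferences from the behavior of a nearly
rational actor can be sketched in a few lines. This is not the
algorithm of Ng and Russell (2000), which works with Markov decision
processes; it is a deliberately simplified one-step version that
assumes the actor picks among a fixed set of options with
probability proportional to exp(beta * reward), and that the true
reward function is one of a few hand-written candidates. All names,
options, and numbers below are hypothetical.

```python
import math


def choice_likelihood(rewards, chosen, beta=2.0):
    """P(chosen option | rewards) under a softmax choice model.

    Larger `beta` means the actor is closer to perfectly rational.
    """
    weights = [math.exp(beta * r) for r in rewards]
    return weights[chosen] / sum(weights)


def posterior_over_rewards(candidates, observations, beta=2.0):
    """Bayesian update over candidate reward functions, uniform prior.

    candidates:   dict mapping a label to a list of rewards, one per option.
    observations: list of indices of the options the actor chose.
    """
    log_post = {name: 0.0 for name in candidates}
    for chosen in observations:
        for name, rewards in candidates.items():
            log_post[name] += math.log(choice_likelihood(rewards, chosen, beta))
    peak = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(lp - peak) for name, lp in log_post.items()}
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}


# Options: 0 = fast route, 1 = safe route, 2 = scenic route.
candidates = {
    "values_speed": [1.0, 0.2, 0.0],
    "values_safety": [0.2, 1.0, 0.3],
}
observed_choices = [1, 1, 0, 1, 1]  # mostly safe, one fast
posterior = posterior_over_rewards(candidates, observed_choices)
print(posterior)
```

Because the choice model tolerates occasional deviations, the single
"fast" choice lowers confidence in the safety hypothesis without
ruling it out.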
Security

It is unclear whether long-term progress in AI will 
make the overall problem of security easier or harder;
on one hand, systems will become increasingly
complex in construction and behavior and AI-based
cyberattacks may be extremely effective, while on the
other hand, the use of AI and machine-learning techniques
along with significant progress in low-level
system reliability may render hardened systems
much less vulnerable than today's. From a cryptographic
perspective, it appears that this conflict
favors defenders over attackers; this may be a reason 
to pursue effective defense research wholeheartedly. 

Although the topics described in the near-term
security research section earlier may become increasingly
important in the long term, very general and
capable systems will pose distinctive security problems.
In particular, if the problems of validity and
control are not solved, it may be useful to create containers
for AI systems that could have undesirable
behaviors and consequences in less controlled environments
(Yampolskiy 2012). Both theoretical and
practical sides of this question warrant investigation.
If the general case of AI containment turns out to be
prohibitively difficult, then it may be that designing
an AI system and a container in parallel is more successful,
allowing the weaknesses and strengths of the
design to inform the containment strategy (Bostrom
2014). The design of anomaly detection systems and
automated exploit checkers could be of significant
help. Overall, it seems reasonable to expect that this additional
perspective, defending against attacks from
within a system as well as from external actors, will
raise interesting and profitable questions in the field
of computer security.

Control 

It has been argued that very general and capable AI 
systems operating autonomously to accomplish 
some task will often be subject to effects that increase 
the difficulty of maintaining meaningful human 
control (Omohundro 2007; Bostrom 2012, 2014; 
Shanahan 2015). Research on systems that are not 
subject to these effects, minimize their impact, or 
allow for reliable human control could be valuable in 
preventing undesired consequences, as could work 
on reliable and secure test beds for AI systems at a 
variety of capability levels. 

If an AI system is selecting the actions that best
allow it to complete a given task, then avoiding conditions
that prevent the system from continuing to
pursue the task is a natural subgoal (Omohundro
2007, Bostrom 2012) (and conversely, seeking unconstrained
situations is sometimes a useful heuristic
[Wissner-Gross and Freer 2013]). This could become
problematic, however, if we wish to repurpose the
system, to deactivate it, or to significantly alter its
decision-making process; such a system would
rationally avoid these changes. Systems that do not
exhibit these behaviors have been termed corrigible
systems (Soares et al. 2015), and both theoretical and
practical work in this area appears tractable and useful.
For example, it may be possible to design utility
functions or decision processes so that a system will
not try to avoid being shut down or repurposed
(Soares et al. 2015), and theoretical frameworks could
be developed to better understand the space of
potential systems that avoid undesirable behaviors
(Hibbard 2012, 2014, 2015).
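
The incentive described above can be made concrete with a toy expected-utility calculation. The sketch below is purely illustrative and is not the construction of Soares et al. (2015): the two actions, the numbers, and the shutdown_compensation term are hypothetical, introduced only to show how a pure task maximizer can prefer to disable an off switch, and how a change to the utility function can remove that preference in this simple setting.

```python
ACTIONS = ["leave_switch", "disable_switch"]


def expected_utility(action, task_reward, p_shutdown, disable_cost,
                     shutdown_compensation=0.0):
    """Expected utility of each action in a hypothetical two-action decision.

    leave_switch:   humans shut the system down with probability p_shutdown,
                    in which case it receives shutdown_compensation instead
                    of task_reward.
    disable_switch: the task is always completed, at a fixed cost.
    """
    if action == "leave_switch":
        return (1.0 - p_shutdown) * task_reward + p_shutdown * shutdown_compensation
    if action == "disable_switch":
        return task_reward - disable_cost
    raise ValueError(f"unknown action: {action}")


def preferred_action(**params):
    # Ties are broken in favor of the first action listed (leave_switch).
    return max(ACTIONS, key=lambda a: expected_utility(a, **params))


# A pure task maximizer: being shut down is worth nothing to it.
plain = preferred_action(task_reward=10.0, p_shutdown=0.3, disable_cost=1.0)

# Hypothetical modified utility: shutdown is valued the same as task completion,
# so the system gains nothing by preventing it.
modified = preferred_action(task_reward=10.0, p_shutdown=0.3, disable_cost=1.0,
                            shutdown_compensation=10.0)
```

With these numbers the plain maximizer prefers disable_switch (expected utility 9 versus 7), while the modified system prefers leave_switch (10 versus 9). Whether such adjustments behave well in richer settings is exactly the kind of open question this research area addresses.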

It has been argued that another natural subgoal for
AI systems pursuing a given goal is the acquisition of
fungible resources of a variety of kinds: for example,
information about the environment, safety from disruption,
and improved freedom of action are all
instrumentally useful for many tasks (Omohundro
2007, Bostrom 2012). Hammond et al. (1995) give
the label stabilization to the more general set of cases
where "due to the action of the agent, the environment
comes to be better fitted to the agent as
time goes on." This type of subgoal could lead to
undesired consequences, and a better understanding
of the conditions under which resource acquisition
or radical stabilization is an optimal strategy (or likely
to be selected by a given system) would be useful
in mitigating its effects. Potential research topics in
this area include domestic goals that are limited in
scope in some way (Bostrom 2014), the effects of
large temporal discount rates on resource acquisition
strategies, and experimental investigation of simple
systems that display these subgoals.
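
As one example of a simple system of this kind, the sketch below brute-forces the optimal plan for a toy agent that, at each time step, either works (earning reward equal to its current capacity) or acquires a resource (raising its capacity by one and earning nothing). The environment, the horizon, and the function names are assumptions made for illustration. The discount factor gamma is the standard one, so a large temporal discount rate corresponds to a small gamma.

```python
from itertools import product


def discounted_return(plan, gamma, initial_capacity=1.0):
    """Discounted reward of a fixed plan of 'work' and 'acquire' steps."""
    capacity = initial_capacity
    total = 0.0
    for t, step in enumerate(plan):
        if step == "acquire":
            capacity += 1.0  # invest now, earn nothing this step
        else:
            total += (gamma ** t) * capacity  # earn at current capacity
    return total


def best_plan(gamma, horizon=8):
    """Exhaustively search all 2**horizon plans for the highest return."""
    plans = product(["work", "acquire"], repeat=horizon)
    return max(plans, key=lambda plan: discounted_return(plan, gamma))


def acquisition_steps(gamma, horizon=8):
    """Number of steps the optimal plan spends acquiring resources."""
    return best_plan(gamma, horizon).count("acquire")


impatient = acquisition_steps(gamma=0.3)   # heavy discounting
patient = acquisition_steps(gamma=0.95)    # light discounting
```

In this toy setting the heavily discounting agent acquires no resources at all, while the patient agent spends at least one step acquiring them before working. The exhaustive search is only feasible for short horizons, but it avoids assuming anything about the form of the optimal policy.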

Finally, research on the possibility of superintelligent
machines or rapid, sustained self-improvement
(intelligence explosion) has been highlighted by past
and current projects on the future of AI as potentially
valuable to the project of maintaining reliable control
in the long term. The AAAI 2008-09 Presidential
Panel on Long-Term AI Futures' Subgroup on Pace,
Concerns, and Control stated that

There was overall skepticism about the prospect of an
intelligence explosion . . . Nevertheless, there was a
shared sense that additional research would be valuable
on methods for understanding and verifying the
range of behaviors of complex computational systems
to minimize unexpected outcomes. Some panelists
recommended that more research needs to be done to
better define "intelligence explosion," and also to better
formulate different classes of such accelerating
intelligences. Technical work would likely lead to
enhanced understanding of the likelihood of such
phenomena, and the nature, risks, and overall outcomes
associated with different conceived variants
(Horvitz and Selman 2009).

Stanford's One-Hundred Year Study of Artificial Intelligence
includes loss of control of AI systems as an
area of study, specifically highlighting concerns over
the possibility that

... we could one day lose control of AI systems via the 
rise of superintelligences that do not act in accordance 
with human wishes — and that such powerful systems 
would threaten humanity. Are such dystopic outcomes
possible? If so, how might these situations
arise? ... What kind of investments in research should 
be made to better understand and to address the possibility
of the rise of a dangerous superintelligence or
the occurrence of an "intelligence explosion"? 
(Horvitz 2014) 

Research in this area could include any of the long-term
research priorities listed previously, as well as
theoretical and forecasting work on intelligence
explosion and superintelligence (Chalmers 2010,
Bostrom 2014), and could extend or critique existing
approaches begun by groups such as the Machine
Intelligence Research Institute (Soares and Fallenstein
2014a).

Conclusion 

In summary, success in the quest for artificial intelligence
has the potential to bring unprecedented benefits
to humanity, and it is therefore worthwhile to
research how to maximize these benefits while avoiding
potential pitfalls. The research agenda outlined
in this paper, and the concerns that motivate it, have
been called anti-AI, but we vigorously contest this
characterization. It seems self-evident that the growing
capabilities of AI are leading to an increased
potential for impact on human society. It is the duty
of AI researchers to ensure that the future impact is
beneficial. We believe that this is possible, and hope
that this research agenda provides a helpful step in
the right direction.